FOREWORD


This report is concerned with the development of the Index of Electronic
Equipment Operability. It is one of five related documents. The Instruction
Manual, Data Store, and Evaluation Booklet are required for evaluating equipment.
The Sample Equipment Evaluations report contains detailed evaluations
of four equipments, including recommendations. This work was performed under
Contract No. DA-36-039-SC-80555 for the Electronic Warfare Department, United
States Army Electronic Proving Ground, Ft. Huachuca, Arizona. Mr. James J.
Edwards and Walter Bonham served as Technical Representatives of the Contracting
Officer, and provided continuing support during the conduct of the study.
Mr. Paul Lamb, Electronic Warfare Department, and Mr. Jeff Abraham, Signal
Communications Department, were of considerable assistance, serving as evaluators
during the tryout of the index. The authors are also indebted to
numerous  personnel  of  the  American  Institute  for  Research,  but  especially  to 
Mr.  Manus  R.  Munger  for  his  critical  review  and  general  contribution  to  the 
preparation  of  the  Instruction  Manual,  and  to  Mr.  Robert  W.  Smith  and  Mrs. 
Sara  J.  Munger,  members  of  the  project  staff,  for  their  over-all  support  and 
contribution  throughout  all  phases  of  the  effort. 
PURPOSE


The  Signal  Corps  has  long  been  aware  of  the  need  for  evaluation  of  its 
equipment.  The  evaluation  of  hardware  has  become  integral  to  the  development 
process,  and  the  recent  past  has  shown  the  development  and  implementation  of 
a  number  of  general  procedures  which  essentially  evaluate  the  human  component 
of  systems.  However,  electronic  equipment  has  become  so  complex  in  recent 
years  that  general,  informal  evaluation  procedures  are  no  longer  feasible. 

The nature, number, and inter-relationships of factors prevent adequate over-all
evaluations, and the tolerances of existing and future equipments demand
quantitative information which such procedures cannot provide. Recently, a
quantitative procedure for the evaluation of electronic equipment maintainability
(Munger & Willis, 1959) was developed and implemented. The purpose
of the present study is to provide a procedure for the early, quantitative
evaluation of electronic equipment operability.

Previous  Efforts 

In  recent  years  there  have  been  numerous  attempts  to  assess  the  human 
element  of  systems.  Handbooks  and  guides  to  aid  design  engineers  have  been 
prepared in considerable numbers (Baker & Grether, 1954; Ely, Thompson, &
Orlansky, 1956; Van Cott & Altman, 1956; Folley & Altman, 1956; Altman, et al.,
1961). Similarly, there has been a considerable effort to develop human factors
checklists for use during evaluation (Krumm & Kirchner, 1956; Berkun & Van Cott,
1956; Van Cott, 1956; Fitzpatrick, 1955; etc.). While these procedures fulfilled
a known need, they are generally inadequate for purposes of evaluation.
Although  the  information  presented  is  often  based  upon  experimental  comparisons, 
there  is  no  way  of  knowing  the  consequences  on  actual  performance. 

Also, as pointed out by Shapero & Bates (1959), "it has been difficult
to integrate the human elements with the rest of the weapon system." Shapero
& Bates develop a "system analysis and integration model" to overcome this
difficulty. In a sense, they achieve their purpose. Their model does integrate
the human element with the remainder of a system, but in a qualitative manner.


That is, although they can demonstrate the interaction of the human element
with all aspects of the system, their scheme does not provide information
about the consequences of the interaction. The chief difficulty lies in the
lack of comparable data about human and other system elements. The performance
of other elements is generally well known, and quantified. The
performance of the human element is generally neither.

In the past few years, there have been several attempts at quantifying
the human element. These efforts are typified by the work of Williams (19--);
Kaufman, Oehrlein, & Kaufman (1961); and Siegel & Wolf (1961). While notable
in concept, these procedures are either too gross, or require information that
is generally not available at the time of evaluation.

Williams has proposed a human reliability evaluation procedure based on
equipment reliability assessment procedures. However, the reliability figures
employed in his procedure are estimates to be made by the evaluator. It is
thus doubtful that two independent evaluations of the same equipment would be
similar. Kaufman, Oehrlein, & Kaufman have based their procedure on easily
available information. Here, human reliability is related to such factors as
volume, cost, and weight of the equipment to be operated. The assumption that
these factors accurately reflect design sophistication seems questionable.
And the further assumption that design sophistication is directly related to
human reliability seems untenable if field operation is the criterion.

The  computer  simulation  approach  proposed  by  Siegel  &  Wolf  is  a  unique 
attempt  to  integrate  a  notion  of  performance  consequences  with  other  system 
considerations. However, their concern is with the determination of "operator
overloading" based upon estimates of human performance time and error. That
is, if performance time and/or errors are in excess of tolerances, then the
operator is overloaded. For this purpose, the model is appropriate. It contains
some useful notions for purposes of evaluation, but in itself is not
appropriate  as  a  general  evaluation  technique. 

Problem 

In view of the general purpose of the study, and the experiences of others,
the primary problem here was to develop an evaluation procedure which provides
quantitative information related specifically to operator performance. It was
planned that the procedure should be applicable during, or prior to, acceptance
testing. Objectives for the procedure were to:

1.  Predict  the  time  and  reliability  (accuracy)  of  operator 
performance. 

2.  Identify  specific  design  features  which  degrade  operator 
performance. 

3.  Provide  general  guidance  concerning  selection  and  training 
of  operators  for  evaluated  equipments. 

Requirements for an Operability Evaluation Procedure

Criteria  which  guided  development  of  the  Index  were: 

1. Meaningfulness. Results should be in terms such as speed and
accuracy of performance which are directly meaningful rather
than in indirect measures that would require considerable
interpretation.

2. Specificity. The specific design features and aspects of performance
contributing to operational complexity should be made
explicit in the evaluation process, resulting in a diagnostic
as well as an over-all evaluation tool.

3. Objectivity. Sufficient guidance should be provided to permit
exactly the same evaluation results by independent evaluators,
within the limits imposed by their irreducible observational
and judgmental differences. On the other hand, the evaluator
should be permitted to note any instances in which he feels
the formal evaluation procedure is incomplete or would be
misleading without specific interpretation.

4. Comprehensiveness. Although there is no way to guarantee that
every important factor will be assessed, every factor of known
importance should be included in the procedure.

5. Ease of use. Every effort should be expended to make application
of the evaluation procedure as simple and straightforward
as possible through the preparation of guidance materials and
forms.


It  was  felt  that  if  the  above  criteria  were  considered  throughout  the 
developmental  process,  there  would  be  maximum  likelihood  that  the  resulting 
procedure  would  be  both  reliable  and  valid.  By  reliability  in  this  context 
is  meant  the  extent  to  which  results  of  independent  evaluations  for  the  same 
item  of  equipment  will  be  similar.  By  validity  is  meant  the  degree  to  which 
evaluation results will accurately predict actual operator performance. Although
there was only very limited opportunity within the scope of this project
to study reliability or validity through empirical tryout, preliminary results
are sufficiently promising to suggest further study and refinement.
CONCEPTUAL APPROACH


The  Problem 

The central problem in the achievement of the above objectives was the
development of a conceptual framework which would enable the use of existing
experimental data. This problem had two major implications:

1. Since the bulk of human engineering experimentation is concerned
with the details of hardware design, any basic framework
used in development of the evaluation procedure must
include a breakdown of general hardware categories and
characteristics.

2. Although the ultimate goal is to predict mission performance,
there are no human performance data available at this level,
and little if any usable data even at the mission phase or
task level. Consequently, the finest unit for which reasonable
performance data can be established is the individual
step, act, or behavior.

Aspects  of  Behavior 

Even for the individual task step or behavior, the available performance
data are generally not appropriate. In order to organize the existing data in
any useful fashion, it seemed necessary to consider the following aspects of
each behavior separately:

1.  Reception  of  information  relevant  to  the  behavior. 

2.  Internal  processing. 

3.  Responding. 

Changes  in  any  of  these  aspects  alter  the  nature  of  the  behavior  unit,  and 
should  be  reflected  in  predictions  relating  to  its  performance. 

These aspects generally fit the standard STIMULUS-ORGANISM-RESPONSE paradigm.
A brief attempt was made to define discrete units of behavior in terms
of these aspects. But, while a good deal is known about vision, audition,
perception, decision making, and psycho-motor activity, the knowledge, as well
as the theory, is not yet sufficient to handle practical problems at this
level. However, these aspects, translated into hardware terms, seemed quite
practical.

Generally, man is associated with a machine input via a control, and with
the output via a display of some sort. That is, the man's input (stimulus) is
the machine's output, and the man's output (response) is the machine's input.

It seemed, therefore, that a careful study of the sources of machine outputs
would provide the information concerning the range of stimuli with which men
would be expected to cope. Similarly, a study of machine inputs, essentially
controls, would identify a majority of the characteristics of man's response.
Thus, the SOR concept was expressed in terms of the source of the stimulus,
gross mediating processes, and the mode and media of the operator response.

Ultimately, the framework for performance analysis involved four levels
of classification:

1. Aspects of behavior which refer to categories of inputs,
mediating processes, and outputs.

2. Components which refer to a specific category of an aspect,
e.g., joystick is a component of the output aspect.

3. Parameters which refer to the relevant characteristics of
components, e.g., stick length is a parameter of the component
joystick.

4. Dimensions which refer to specific values of the relevant
parameters, e.g., six inches is a dimension of the parameter
stick length.
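
The four levels can be pictured as a nested record. The Python sketch below is illustrative only: the class names, field names, and the `add_parameter` helper are assumptions introduced here and are not part of the Index. Only the joystick example is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Parameter:
    """A relevant characteristic of a component, e.g. stick length."""
    name: str
    dimension: str  # the specific value of the parameter, e.g. "six inches"


@dataclass
class Component:
    """A specific category of an aspect, e.g. a joystick."""
    name: str
    aspect: str  # "input", "mediating process", or "output"
    parameters: Dict[str, Parameter] = field(default_factory=dict)

    def add_parameter(self, name: str, dimension: str) -> None:
        # Hypothetical helper: record one parameter and its dimension.
        self.parameters[name] = Parameter(name, dimension)


# The example from the text: aspect -> component -> parameter -> dimension.
joystick = Component(name="joystick", aspect="output")
joystick.add_parameter("stick length", "six inches")
```

Keeping the dimension as a field of the parameter mirrors the wording above, in which a dimension is a value taken by a parameter and not a separate object.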

Required Information

The information required by this conceptual approach is of several
varieties.  The  evaluator  must  have  detailed  equipment  information  that  is 
relevant  to  operation.  This  is  generally  restricted  to  design  details  of 
controls  and  displays  and  their  spatial  and  functional  relationships. 
Detailed information is also required about the use of these controls and
displays. Essentially, this is operating information contained in a task
analysis  or  detailed  operating  manual. 

Of  primary  importance  here  is  the  need  for  performance  data  relevant 
to  operation.  That  is,  there  must  be  a  convenient  store  of  information  that 
contains  performance  data  for  any  control  or  display  that  may  be  encountered. 

In  addition,  there  must  be  guidance  materials  to  insure  consistency  of  the 
evaluation  process  and  its  results,  as  well  as  aiding  the  interpretations  and 
use  of  the  results.  Figure  1  presents  this  gross  conceptualization  graphically. 

Limitations

The  major  limitation  inherent  in  this  approach  is  that  the  consequences 
of specific components and parameters in interaction are unknown. The determination
of these effects, at the level of detail required here, is currently
beyond the state-of-the-art. It is assumed that interaction effects will tend
to  balance  out  so  that  results  of  evaluation  will  not  be  consistently  in  error. 

A  second  potential  limitation  is  due  to  the  reliance  upon  experimental 
data. Insofar as possible, available experimental data were used in formulating
guidance for performance estimates. Thus the final procedure, and its
results,  can  only  be  as  good  as  the  data  upon  which  it  is  based.  The  major 
work  is  yet  to  be  done  in  establishing  reliable,  general  standards  of  human 
performance. 

Additional  limitations  of  the  Index  are  inherent  in  the  statement  of 
assumptions  below. 

Assumptions 

To assure that the Index achieves its stated purposes, the following conditions
must be met in applying the Index:

1.  Available  equipment  and  task  information  must  accurately 
describe  the  design  and  operating  characteristics  of  the 
equipment  to  be  evaluated.  Any  change  in  the  design  of  the 
equipment  or  the  allocation  of  operator  responsibilities 
will alter the detailed evaluation results and may significantly
alter the interpretation of the results.

Figure 1. Gross Schematic of the Operability Evaluation Problem


2. The Index is intended for prediction of performance by relatively
unselected personnel who have received only nominal
training. In most cases, rigid selection criteria or intensive
training will result in operator performance that is
faster and more reliable than performance predicted by the
Index.

3. The Index should be applied by a professional human factors
engineer or other personnel qualified to evaluate man-machine
interactions.




DEVELOPMENT  OF  THE  INDEX 


Summary 

Measurable performance related equipment components, applicable to
any operating behavior, were identified and categorized in accordance with
the general framework. General and experimental information pertaining to
these characteristics and the factors, or parameters, affecting performance
were abstracted from the literature. This abstracted data was
related to the categories of equipment components. A general correction
factor was computed and applied to all data to compensate for the laboratory
conditions under which they were generated. The resulting corrected
data was integrated and organized into a store of data for ready access.
Procedures for evaluation based upon this data and equipment and operating
information were developed. Scoring procedures were developed in detail,
and guidelines to the interpretations of evaluation results for the specified
purposes were presented. Each major effort is covered in more
detail below.


Components  of  Behavior 
Identification  of  Components 

General performance related equipment components, or behavior components,
were identified. Various types of equipment and equipment manuals
were surveyed. During the survey, all controls and displays observed
were noted. In addition, the "thinking" or mediating process required
of operators was inferred and noted. The resulting lists were somewhat
lengthy, due to numerous variations of a few basic components. The categories
of components selected for inclusion in the Index consisted,
generally, of these basic, unique components. In some cases, however,
subsequent experimental findings led to a further breakdown of some components.
For example, the general component "scales" was subsequently
broken down into types of scales. These categories were then related to
input, mediating process, or output aspects in accordance with the basic
framework. The result of this process appears in Table I below.
Table I. List of Input, Mediating Process, and Output Components


Inputs                  Mediating Process             Outputs

Circular Scales         Identification/Recognition    Cable Connections
Counters                Manipulation                  Cranks
Labeling                                              Disconnecting
Lights                                                Joysticks
Linear Scales                                         Knobs
Non-Speech                                            Levers
Scopes                                                Object Positioning
Semi-Circular Scales                                  Pushbuttons
Speech                                                Rotary Selectors
                                                      Speech
                                                      Toggle Switches
                                                      Writing

These components, and their variations, account for most of the important
sources of information to the operator (input), his treatment of this
information (mediating process), and the modes of his responses (output).

Parameters  Affecting  Component  Performance 

For each component, the associated parameters or factors which affected
performance were identified. The approach here was both rational
and empirical. The rational approach consisted of a careful consideration
of each component. The attempt was to identify all possible factors that
might affect the use of a component. These parameters were noted and
checked against the empirical approach results.

The empirical approach consisted of surveying the experimental literature
to identify the dependent variables which had been studied. With
15-20 years of experimentation as a source, it was felt that most significant
components and factors had already been identified and studied. Thus,
the literature was reviewed and all studies concerned with a given component
abstracted. The resulting abstracts were summarized so that all the parameters
studied and their consequences for performance could be observed.
The resulting list of parameters was then compared with the parameters
identified by rational analysis. The purpose of the comparison was to
insure that all parameters of significance had been studied and the data
abstracted. Some parameters had apparently never been studied, and others
were impossible to isolate or measure without elaborate equipment. Where
there was no data on a parameter, its effect was judged; where the parameter
could not be readily measured, gross notions as to its dimensions
were identified which could be readily judged by an evaluator.

As an example of the kinds of parameters identified, Table II presents
the parameters affecting performance on the component "lights."

Table II. Parameters Affecting Performance on Lights

            Size
            Brightness
LIGHTS      Type/Function
            Number
            Presentation


These  parameters  are  believed  to  be  the  most  important  ones  in  terms  of 
their  effect  on  performance,  and  they  can  be  easily  identified. 

An attempt was made to incorporate other factors, such as situational,
motivational, and personality factors, into the Index. This effort was
abandoned, however, when it became clear that existing data was quite
contradictory and insufficient for Index purposes.

Performance  Data 

Data  Abstracts 

Performance  data  related  to  the  components  and  parameters  identified 
were abstracted from the experimental literature. Over the course of the
study, several thousand research reports were surveyed. Of these, several
hundred were selected for careful consideration. Reports meeting the
following requirements were finally abstracted:

1. Experimental in nature.

2. Specific to type(s) of control or display, or generalizable
to input, mediating process, or output aspect of behavior.

3.  Raw  or  grouped  data  presented  with  the  analysis,  rather  than 
simple  reports  of  conclusions. 

4.  Emphasis  on  time  and/or  error  measures,  or  measures  trans¬ 
latable  into  these  terms. 

5.  Well-defined  dimensions  of  controls  and  displays. 

6.  Explicit  statement  of  experimental  method  and  conditions. 

A  total  of  164  research  reports  meeting  these  requirements  were 
abstracted.  References  to  these  reports  appear  as  an  Appendix. 

Two examples of the kind of data available in the literature, and the
way it was extracted, are presented in Figure 2.

Figure 2. Sample of Abstracted Data


Example 1

If a rotary knob is used for making settings on a linear scale, is the
control friction approximately 100 grams (at periphery of knob)?

If this is not followed:

Operation or setting times will be increased (if greater friction is
used).


Friction         10/16" travel    50/16" travel

100 grams        2.24             2.92
400 grams        2.38             3.90*
700 grams        2.45             4.45*
1,000 grams      2.37             4.81*
1,300 grams      2.58*            5.10*

*Significant differences beyond 1% level from underlined figure in column.
(Jenkins, L. J. J. Appl. Psychol., 1950, 34)
Example 2

Ss: 20 college students with normal Snellen Acuity and no obvious visual
defects.

Task:

Photographs of circular moving pointer dials were presented in a slide
mirror tachistoscope. Viewing distance was 28"; viewing angle was varied
from 90° to 25°. Two types of white on black dials were used (dial diameter
was not given):

1. 600 unit dial--graduation marks every 10 units; 50 and 100 marks
heavier and longer; 100 unit marks also numbered.

2. 400 unit dial--graduation marks every 10 units; every 40 unit
markers heavier and longer and numbered.

Brightness of white markings was 7 ft. lamberts.

Ss were given 10 practice trials. Each S was given 40 test trials, 20
on each dial type, 4 at each of 10 viewing angles, and 5 in each dial quadrant.
For each dial, half the settings were near a graduation mark and
half were near a mid-mark position. S controlled exposure time; E recorded
time and the reading given by S. Instructions were for S to "read the
dial to the nearest 5 units as accurately and quickly as possible."


Per Cent Reading Errors of 5 Units or More at Each Viewing Angle
(No. Ss = 20. No. readings = 4 per S at each viewing angle)


Viewing angle                % Readings in Error
(Both types dial combined)   by 5 Units or More

90°                          14.0
80°                          12.5
70°                          15.0
60°                          21.0
50°                          17.5
45°                          16.5
40°                          22.5
35°                          23.5
30°                          20.5
25°                          22.5

(Data extrapolated from graph.)


Reading time showed no systematic change associated with viewing angle
with either dial.

(Cohen, Vanderplas, & White, J. Appl. Psychol., 1953, 37.)

Ideally, a comprehensive collection of such data could be treated to
yield performance data for each equipment characteristic, considering all
relevant factors which might influence performance. In reality, however, data
abstracted from the literature is markedly lacking in consistency with respect
to the factors investigated, kinds of measure used, and experimental rigor.
Human engineering studies, which yield the most relevant information, are
minimally generalizable. Most of these studies were conducted to answer
specific questions, and there is a marked lack of any theoretical framework
within which such studies were conducted. Thus, while a great amount of
relevant experimental data exists, it was difficult to bring the data, in
its original form, to bear directly on the general problem under study.

There  was,  then,  the  significant  problem  of  reducing  and  integrating 
the available data into a form compatible with an evaluation procedure.

In  approaching  this  problem,  the  performance  measures  used  by  the  various 
investigators  were  transformed  into  consistent  time  and  error  terms.  That 
is,  if  at  all  possible,  dependent  measures  were  expressed  as  per  cent  of 
trials  in  error,  and  time  required  per  trial.  In  only  a  few  cases  was 
this  not  possible.  This  translation  into  consistent  measures  tended  to 
reduce  the  complexity  of  the  massed  data. 
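
The two common measures named above are simple ratios. As a minimal sketch, assuming a study reports a count of trials in error and a total elapsed time (the function names and the numbers in the example are hypothetical, not drawn from any abstracted report):

```python
def percent_trials_in_error(error_trials: int, total_trials: int) -> float:
    """Per cent of trials in error."""
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    return 100.0 * error_trials / total_trials


def time_per_trial(total_time_seconds: float, total_trials: int) -> float:
    """Time required per trial, in seconds."""
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    return total_time_seconds / total_trials


# Hypothetical study: 3 of 20 trials in error, 45 seconds for all 20 trials.
example_error = percent_trials_in_error(3, 20)
example_time = time_per_trial(45.0, 20)
```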

Reduction of Abstracted Data

Reduction  of  the  abstracted  data  was  required  to  reduce  the  mass  of 
apparently  unrelated  information.  Tables  were  prepared  which  summarized, 
for  each  component,  all  available  abstract  information.  These  tables 
presented: 

1.  A  list  of  possible  parameters  for  each  component. 

2.  The  dimensions,  or  specific  values,  of  each  parameter 
studied  and  performance  measures  related  to  each 
dimension. 

3. The experimental conditions of each study and the performance
measures related to each condition.

4. The number and kinds of subjects, number of practice and
test trials, etc., and related performance measures.

Both  discrepancies  in  data  and  the  absence  of  data  could  be  easily 
noted  from  the  tables.  A  majority  of  the  discrepancies  were  found  to  be 
due  to  gross  differences  in  the  mediating  processes  required.  Imposing 
a  consideration  of  these  processes  on  the  data  eliminated  most  of  the 
major  differences.  Those  few  instances  where  this  was  not  the  case  were 
reconciled  by  the  judgment  of  the  project  staff.  The  absence  of  data 
resulted in a specific search of the literature to fill the gaps in the
summary  tables.  Where  this  search  failed,  the  data  were  generated  by 
extrapolation  or  interpolation  from  related  studies,  or,  as  a  last  resort, 
by  expert  judgment. 


The major reduction of data was accomplished by grouping together
some dimensions of the studied parameters. It was obvious that the parameters
could not, and should not, be presented at the detailed level at
which they were studied. The decisions for grouping the dimensions into
a workable number were somewhat arbitrary. Every effort was made to base
grouping on the statistical significance of differences found between
dimensions. Where this was not possible, the criterion of meaningfulness
of dimensions took precedence. An example of how this dimensional grouping
was accomplished is presented in Table III below. The data refers to
control knobs. The parameter of concern in this example is "size."


Table III. Example of Dimensional Grouping

Average Knob Turning Time (in Seconds) Under Varied Shaft Friction


Original Data

Knob Diameter    Moderate Friction    Heavy Friction

1/2 inch         1.649                2.170
3/4 inch         1.553                1.802
1 inch           1.318                1.585
1 1/4 inch       1.237                1.498
1 1/2 inch       1.262                1.368
1 3/4 inch       1.213                1.328
2 inch           1.211                1.264
2 1/4 inch       1.208                1.281
2 1/2 inch       1.256                1.317
2 3/4 inch       1.245                1.430
3 inch           1.292                1.419
3 1/4 inch       1.275                1.394


Grouped Data

                      Moderate Friction                    Heavy Friction
                      Mean        Widest Difference        Mean        Widest Difference
                      Operation   Between Original Means   Operation   Between Original Means
Knob Diameter         Time        Actual    As % of Mean   Time        Actual    As % of Mean
                                            Operation                            Operation
                                            Time                                 Time

Less than 1"          1.601       0.096     --             --          --        --
1" to less than 2"    1.257       0.105     --             --          --        --
2" to less than 3"    1.230       0.048     0.04           --          --        --
3" or more            1.283       0.017     0.01           --          0.025     --

-- Entry illegible in the source copy.

From  this  table  it  is  evident  that  little  loss  of  data  occurred  by 
grouping  in  this  manner.  The  range  of  operation  time  included  in  the 
groups  is  a  very  small  percentage  of  the  mean  operation  time  for  a  given 
size  of  knob.  Yet,  differences  in  time  between  knob  sizes  are  apparent. 

In those few cases where several studies showed divergent grouping tendencies,
final groupings were decided arbitrarily.
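
The grouping arithmetic in Table III can be retraced from the original means. The Python sketch below is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the "widest difference" is taken to be the largest minus the smallest original mean within a group, and the rounding used in the report is not known, so computed values may differ from the printed ones in the last digit.

```python
# Original means from Table III (seconds), keyed by knob diameter in inches:
# (moderate friction, heavy friction).
ORIGINAL_MEANS = {
    0.50: (1.649, 2.170), 0.75: (1.553, 1.802),
    1.00: (1.318, 1.585), 1.25: (1.237, 1.498),
    1.50: (1.262, 1.368), 1.75: (1.213, 1.328),
    2.00: (1.211, 1.264), 2.25: (1.208, 1.281),
    2.50: (1.256, 1.317), 2.75: (1.245, 1.430),
    3.00: (1.292, 1.419), 3.25: (1.275, 1.394),
}


def group_label(diameter: float) -> str:
    """Assign a knob diameter to one of the four grouped dimensions."""
    if diameter < 1:
        return 'less than 1"'
    if diameter < 2:
        return '1" to less than 2"'
    if diameter < 3:
        return '2" to less than 3"'
    return '3" or more'


def summarize(column: int) -> dict:
    """Return {group: (mean, widest difference, difference / mean)}.

    column 0 is moderate friction, column 1 is heavy friction.
    """
    groups = {}
    for diameter, times in ORIGINAL_MEANS.items():
        groups.setdefault(group_label(diameter), []).append(times[column])
    summary = {}
    for label, values in groups.items():
        mean = sum(values) / len(values)
        spread = max(values) - min(values)  # assumed meaning of "widest difference"
        summary[label] = (mean, spread, spread / mean)
    return summary


moderate = summarize(0)
heavy = summarize(1)
```

Under these assumptions the moderate-friction group for 2" to less than 3" has a mean of about 1.230 seconds and a widest difference of about 0.048 seconds, or roughly 0.04 of the mean, which agrees with the entries printed in Table III.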

The  result  of  this  process  was  a  first  approximation  store  of  data 
for  use  during  evaluation.  However,  the  data  at  this  point  was  still 
expressed  in  terms  of  time  per  trial,  and  per  cent  of  trials  in  error. 

In  order  to  isolate  the  contribution  of  a  given  dimension  to  time  and 
error,  further  integration  of  the  data  was  required. 

Data  Integration 

The  most  frequent  case  among  the  studies  abstracted  was  where  two 
or  more  parameters  were  varied  simultaneously.  It  was  necessary  to  deter¬ 
mine  the  general,  but  independent  effects  of  each  dimension  of  every  para¬ 
meter  upon  both  total  time  and  error.  The  integration  procedure  was  as 
follows.

The  magnitude  of  the  measures  obtained  in  the  abstracted  studies  is 
dependent upon known dimensions of parameters being studied, and all other
factors and conditions, whether controlled or not. It is reasonable to
assume, however, that within a given study, these other factors remain
fairly  constant.  Therefore,  differences  in  obtained  measures  may  be 
attributed  to  known  variations  of  dimensions  of  parameters  being  studied. 

A  change  in  conditions  may  be  expected  to  alter  the  magnitude  of  the  mea¬ 
sures,  but  the  differences  attributable  to  known  variations,  relative  to 
the  magnitude,  would  be  expected  to  remain  the  same,  except  as  a  result  of 
their  interaction  with  the  new  condition. 

Time  Estimates.  Time  data  from  a  single  study  on  "rotary  controls"  is 
presented  in  Table  IV  to  serve  as  an  example  of  how  time  consequences  were 
determined. 

Table IV. Sample Data on Rotary Controls

Since, as stated above, the magnitude of the measures is dependent on
a host of factors, most of which are unknown, there is minimal concern
with the actual numbers in Table IV. Differences in numbers attributable
to the dimensions studied are determined by taking differences between
means. That is, with regard to size, the difference between the means for
<1" and 2-3" controls is .13 seconds. In this case, we say that the
consequence of 2-3" controls, rather than <1", over placement and all other
unknown factors and conditions, is .13 seconds. The consequence of <1"
controls can only be interpreted as zero, since in this example it is the
optimum dimension. With regard to placement, the Front is optimum here,
and its consequence is zero. The performance consequence of Top is .32,
and Side is .67 seconds. Since it is most reasonable to establish consequence
for deviations from an optimum, there is no concern with differences between
Top and Side placement.
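
The derivation of consequences from means can be sketched as follows. This is an illustration only: the function name is an assumption, and the three placement means are hypothetical values chosen so that their differences match the .32 and .67 seconds quoted above (the means themselves are not given in this section).

```python
def consequences(mean_times):
    """Map each dimension to its time above the optimum (fastest) dimension."""
    optimum = min(mean_times.values())
    return {dim: round(t - optimum, 2) for dim, t in mean_times.items()}


# Hypothetical placement means in seconds; only the differences come from the text.
placement_means = {"Front": 1.20, "Top": 1.52, "Side": 1.87}
print(consequences(placement_means))
```

The optimum dimension always receives a consequence of zero, so no comparison between two non-optimum dimensions is ever needed.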

Optimum  levels  of  performance  for  each  component  were  determined.  That 
is,  a  "base  time"  was  established  for  each  component  which  assumed  all  para¬ 
meters  were  optimum.  This  base  time  was  determined  by  searching  the  ab¬ 
stracts  and  finding  optimum  conditions  for  the  component  under  study.  Given 
a base time, the consequences of non-optimum dimensions of parameters were
considered  as  time  added  to  the  base. 

Such  consequences  and  base  times  were  determined  for  all  components. 
Where  replications  of  studies  occurred,  mean  consequences  were  established. 
Where  a  given  factor  was  studied  under  significantly  different  conditions, 
the  abstracts  of  the  studies  were  examined.  Generally,  it  was  possible  to 
justify  and  adjust  consequences  based  on  obvious  factors  such  as  subjects 
or  practice  trials,  etc.  In  the  few  cases  where  consequences  appeared 
irreconcilable, adjustments were made based on judgment.

Error Estimates. Time measures obtained in this manner seemed to be
reasonable  estimates  of  performance  time  under  actual  operational  condi¬ 
tions.  The  error  estimates,  however,  were  considered  gross  over-estimates 
of  operational  errors,  because  of  artificial  inflation  of  error  counts. 

This  inflation  occurs  because  error  rates  are  normally  relatively  low  in 
operational  task  performance.  In  order  to  have  measurable  error  without 
running a very large number of trials, experimenters inflate errors by
making tasks unusually difficult, or by counting potential, or near, errors.
The derivation of some operational meaning from this data, relevant to the
evaluation problem, required a more indirect approach.

The  most  meaningful  notion  of  operator  accuracy  for  evaluation  pur¬ 
poses  is  that  of  operator  reliability.  But  this  notion  of  operator  reli¬ 
ability  should  apply  to  the  component  of  behavior.  That  is,  reliability 
measures  should  be  available  for  each  of  the  inputs,  mediating  processes, 
and  outputs.  In  order  to  achieve  this,  the  reliability  contribution  of 
each dimension of every relevant parameter must be known. Ideally, the
reliability contribution at this level should be determined empirically.
However, this would require a long term, extensive effort which was far
beyond the scope of the study. The interim solution to this problem consisted
of scaling the grossly inflated laboratory error counts against
available  estimates  of  over-all  field  reliability. 

The over-all estimates of field reliability were obtained from previous
studies (Miller, et al, 1957; Craig, et al, 1957). Over a
variety of equipments and missions, the range of operator reliability
estimates was between 85 and 90 per cent. Conversely, it may be said
that 10 to 15 per cent of the time, operator error will fail or seriously
degrade mission effectiveness. No field studies have been conducted which
provide reliability estimates for individual task steps, behaviors, or
behavior components. The best estimate at this level seemed to be a "mean
mission step unreliability" figure. Unreliability was chosen for computational
convenience only.

This estimate was obtained by determining the mean number of steps in a
mission, and dividing a mean mission unreliability estimate of .13 by that
number. The mean number of steps was determined by counting the required
steps for 26 different equipments. The mean number was near 50. The mean
mission step unreliability obtained in this way was .0026. In different words,
the best estimate seemed to be that 26 times in ten thousand, an operator
error on a given step of operation would fail or seriously degrade mission
effectiveness.

This mean mission step unreliability was then compared with an estimate
of mean unreliability per experimental trial. This was determined
from the data available from the abstracts. Over the data available, it
was found that mean unreliability per trial was .31935. Thus, there were
two estimates of mean step unreliability: one based on estimates of actual
field operation, one based on laboratory experimentation. Assuming that
experimental trials are roughly equivalent to individual steps of operation,
the ratio of these means is a reasonable conversion factor for laboratory
results. That is, correcting all the experimental results by a factor of
.008145 compensated for the laboratory conditions, and rendered
the data more compatible with field operation. The corrected unreliability
figures were then converted to conventional reliability scores.
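
The arithmetic of this correction can be sketched as below. The sketch assumes the figures quoted in the text (.13, about 50 steps, .31935, and the printed factor .008145); the function name and the .40 laboratory error rate are hypothetical. The ratio of the rounded figures comes to roughly .00814, slightly below the printed factor, presumably because the report worked from unrounded values.

```python
MISSION_UNRELIABILITY = 0.13   # mean mission unreliability, from the text
MEAN_STEPS = 50                # approximate mean number of steps per mission
LAB_UNRELIABILITY = 0.31935    # mean unreliability per experimental trial
REPORTED_FACTOR = 0.008145     # conversion factor as printed in the report

# Mean mission step unreliability: mission unreliability spread over the steps.
step_unreliability = MISSION_UNRELIABILITY / MEAN_STEPS

# Ratio of field-based to laboratory-based step unreliability.
factor = step_unreliability / LAB_UNRELIABILITY


def field_reliability(lab_error_rate, conversion=REPORTED_FACTOR):
    """Scale a laboratory error rate down and express it as a reliability."""
    return 1.0 - lab_error_rate * conversion


print(round(step_unreliability, 4))
print(round(factor, 6))
print(round(field_reliability(0.40), 4))  # 0.40 is a hypothetical laboratory error rate
```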

Attributing  effects  to  behavioral  components.  These  reliability  scores, 
based  upon  steps  of  operation  and  experimental  trials,  were  considered 
attributable  to  individual  components  of  behavior.  The  justification  for 
this  is  that,  in  an  experiment  involving  an  input,  every  effort  is  typically 
made  to  reduce  error  or  unreliability  due  to  mediating  and  output  aspects  of 
the behavior to an absolute minimum. Expressed in other terms, the reliability
of a behavior is dependent upon the reliabilities of the aspects of
behavior. In experimental studies involving one aspect, the reliabilities
of the other aspects are made to approximate unity. When time is the dependent
measure for an aspect under study, the time attributable to the other
aspects is held to a minimum. Thus, attributing time and/or reliability,
as corrected by the above factor, to a single component seems reasonable.

Organization  of  the  Data  Store 

Organization  of  the  data  into  a  convenient,  accessible  form  was  possible 
with  the  treated  data.  The  result  of  this  organization  was  the  Data  Store  of 
the Index. Figure 3 presents the data as it was finally presented in the
Data Store. Individual cards were prepared for each component. On each
card, the parameters relevant to the component were presented. The dimensions
associated with each of the parameters were listed with the associated
data. The base time for each component appears at the top of the time
column. This base, as mentioned previously, serves as an absolute minimum
time for behaving with the component. This time will hold only if all the
parameters listed are of optimum dimension, i.e., add no time. Other dimensions
will add time to the base. Reliability estimates are presented for
each parameter dimension.


Figure 3. Sample Data Store Card

JOYSTICK (May move in many planes)

BASE TIME = 1.33

                                                    Time added   Reliability

1. Stick length
   a. 6-9"                                          1.50         .9963
   b. 12-18"                                        0            .9967
   c. 21-27"                                        1.50         .9963

2. Extent of stick movement (Extent of
   movement from one extreme to the other
   in a single plane.)
   a. 5-20°                                         0            .9981
   b. 30-40°                                        .20          .9975
   c. 40-60°                                        .50          .9960

3. Control resistance
   a. 5-10 lbs.                                     0            .9999
   b. 10-30 lbs.                                    .50          .9992

4. Support of operating member
   a. Present                                       0            .9990
   b. Absent                                        1.00         .9950

5. Time delay (Time lag between movement
   of control and movement of display.)
   a. .3 sec.                                       0            .9967
   b. .6-1.5 sec.                                   .50          .9963
   c. 3.0 sec.                                      3.00         .9957
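
The way a card is read during evaluation can be sketched as a lookup. The sketch below is illustrative only: the data structure, the function name, and the chosen joystick configuration are assumptions, and it assumes that within a component times add to the base while reliabilities multiply, as in the step scoring described later. The values are those read from the sample card in Figure 3.

```python
from math import prod

BASE_TIME = 1.33  # seconds, from the sample card

# parameter -> dimension -> (time added in seconds, reliability)
JOYSTICK = {
    "stick length": {'6-9"': (1.50, 0.9963), '12-18"': (0.0, 0.9967),
                     '21-27"': (1.50, 0.9963)},
    "extent of movement": {"5-20 deg": (0.0, 0.9981), "30-40 deg": (0.20, 0.9975),
                           "40-60 deg": (0.50, 0.9960)},
    "control resistance": {"5-10 lbs": (0.0, 0.9999), "10-30 lbs": (0.50, 0.9992)},
    "support": {"present": (0.0, 0.9990), "absent": (1.00, 0.9950)},
    "time delay": {".3 sec": (0.0, 0.9967), ".6-1.5 sec": (0.50, 0.9963),
                   "3.0 sec": (3.00, 0.9957)},
}


def score_component(card, base_time, chosen):
    """Return (time, reliability) for one component given the chosen dimensions."""
    entries = [card[param][dim] for param, dim in chosen.items()]
    time = base_time + sum(added for added, _ in entries)
    reliability = prod(rel for _, rel in entries)
    return time, reliability


# Hypothetical joystick: optimum except for extent of movement and arm support.
chosen = {"stick length": '12-18"', "extent of movement": "30-40 deg",
          "control resistance": "5-10 lbs", "support": "absent",
          "time delay": ".3 sec"}
print(score_component(JOYSTICK, BASE_TIME, chosen))
```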

Information  Required  for  Evaluation 

Two  general  types  of  information  concerning  the  equipment  to  be  evalu¬ 
ated  must  be  obtained  before  the  Index  can  be  successfully  applied. 

Equipment Information

Data concerning the equipment should include detailed information about
the controls and displays. If prototype, pre-prototype, or mock-up equipment
is available for evaluation, this information can be obtained directly.
If an evaluation is to be conducted prior to mock-up, then the level of
detail required may present a problem. The Data Store described above
indicates the nature of the equipment information required.

Operating Information

This information is concerned with what an operator must do with
the controls and displays. Most of the information will be contained in
a good task analysis, or in detailed operating manuals, or may be supplied
by an expert on operation of the equipment. In all cases, however, some
of the information must be inferred. While the input and output of operation
are almost always easily specified, the mediating processes involved
in operation must be inferred. At the level of concern here, however, the
inferences are not difficult to make.

Guidance Materials and the Evaluation Process

Given  the  Index  Data  Store  and  the  information  required  concerning 
equipment  and  its  operation,  the  evaluation  process  becomes  essentially 
that  of  matching  the  information  with  the  data.  The  guidance  materials 
developed  are  detailed  instructions  to  guide  this  process.  In  addition, 
instructions  are  presented  for  scoring  the  Index,  and  interpreting  the 
results. 

The application of the Index requires the completion of six major steps
or processes. These steps are listed briefly below. Detailed instructions
for each step are contained in the Index Instruction Manual.

1. Organize Equipment and Operating Information. Data obtained
from task analyses and other sources must be analyzed into
behavioral steps and sequenced by mission phases of operation.

2. Collect Evaluation Data. This step includes the identification
of relevant components, parameters, and dimensions for
each step, matching these values with the data in the Data
Store, and entering the appropriate values on an Evaluation
Sheet.

3. Score Evaluation Sheet. Step scores are computed for each
aspect of behavior and across aspects for total step scores
by adding together the relevant time entries and multiplying
together the reliability estimates. These totals are entered
on the Evaluation Sheet.

4. Summarize Results by Mission and Phase. Total values for
each phase of a mission and for the total mission are computed
from the data on the Evaluation Sheet, similar to
the method for obtaining step totals. The results of this
summary are entered on the Mission and Phase Summary Form.

5.  Summarize  Results  by  Component.  Total  values  for  each  com¬ 
ponent  of  the  input,  mediating  process,  and  output  aspects 
of  behavior  are  computed  across  the  steps  of  each  phase  of 
the  mission.  The  values  are  entered  on  the  Component 
Summary  Form. 

6.  Derive  Recommendations.  Based  on  the  summarized  results  of 
the  evaluation  listed  above,  recommendations  may  be  developed 
in  the  following  three  areas: 

a.  Redesign.  Redesign  recommendations  are  based  on 
consideration  of  total  component  scores  on  the  Com¬ 
ponent  Summary  Form  and  selection  of  alternate  dimen¬ 
sions  from  the  information  contained  in  the  Data  Store 
to  improve  potential  operator  performance. 

b.  Training.  Training  recommendations  will  be  based  on 
analysis  of  the  Component  Summary  Form  and  will  identify 
aspects  of  performance  that  should  be  given  special 
attention  in  the  training  of  operators. 

c. Selection. Selection recommendations will be based on
identification  of  aspects  of  behavior  which  contribute 
significantly  to  total  mission  scores  on  the  Mission 
and  Phase  Summary  Form.  These  aspects  may  then  be 
related  to  general  selection  requirements  for  operators 
emphasizing  these  aspects. 
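
The scoring rule in steps 3 and 4 (times add, reliabilities multiply) can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the nested layout, and every number in the example mission are hypothetical, and between-step times are left out for brevity.

```python
from math import prod


def combine(scores):
    """Combine (time, reliability) pairs: times add, reliabilities multiply."""
    scores = list(scores)
    return sum(t for t, _ in scores), prod(r for _, r in scores)


# Hypothetical evaluation: phase -> list of steps -> aspect scores
# given as (time in seconds, reliability).
mission = {
    "setup": [
        {"input": (1.5, 0.999), "mediating": (0.5, 0.998), "output": (2.0, 0.997)},
        {"input": (1.0, 0.999), "mediating": (0.3, 0.999), "output": (1.2, 0.998)},
    ],
    "operate": [
        {"input": (2.0, 0.998), "mediating": (0.8, 0.997), "output": (1.7, 0.996)},
    ],
}

# Step scores are combined into phase scores, and phase scores into a mission score.
phase_scores = {
    phase: combine(combine(step.values()) for step in steps)
    for phase, steps in mission.items()
}
mission_score = combine(phase_scores.values())
print(phase_scores)
print(mission_score)
```

Because reliabilities multiply, a long mission made of individually reliable steps can still show a low total reliability.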

Summary Description

A graphical summary of the basic evaluation process involved in the
Index is presented in Figure 4. Essentially, the individual steps of operation
are analyzed in their component parts. Scores for these components are
determined with the aid of the Data Store. The component scores, and
between-step time scores, are then combined into step scores. The step scores
can then be combined in various ways to yield total aspect, phase, and mission
scores. Total scores for specific components are taken from the general
component scores. This array of quantitative information of different levels
can then be used to guide decisions and recommendations concerning the
equipment evaluated.

Figure 4. Graphical Summary of the Basic Evaluation Process


INDEX  TRYOUT  AND  RESULTS 


During  September  1961,  a  field  tryout  of  the  Index  was  conducted.  The 
tryout consisted of evaluation and reporting on four equipments. They were:

1.  AN/GRC-50  Mobile  UHF  Radio  Relay  Equipment 

2.  AN/APS-94  Airborne  Radar  System 

3.  M-33  Anti-Aircraft  Fire  Control  System 

4. AN/MLQ-8 (XL-2) Electronic Countermeasures Set

The evaluation materials and reports, including recommendations, are contained
in the Sample Equipment Evaluations Report.

The field tryout of the Operability Index had four major goals. They
were:

1.  Evaluate  the  extent  to  which  the  Index  can  be  applied  to 
different  equipments  (versatility). 

2.  Determine  the  consistency  of  time  and  reliability  scores 
derived  by  different  evaluators  for  the  same  equipment 
(reliability).

3.  Determine  the  extent  to  which  the  Index  reflected  the 
known  operability  of  the  equipment  undergoing  evaluation 
(validity).

4.  Determine  the  effectiveness  of  the  evaluation  data  in 
diagnosing  problems  in  the  area  of  equipment  design,  sel¬ 
ection,  and  training  (utility). 

Versatility

In order to assess its versatility, the Index was applied to equipment
which varied in terms of operating requirements. For example, the AN/GRC-50
is operated by one man, and chiefly involves aligning and adjusting the
equipment. The M-33 is a multi-man operation concerned with the manual
acquisition of targets. A unique aspect of operating, in terms of the Index,
was the antenna erection required with the AN/MLQ-8. The AN/APS-94 was
significant in that the control panels were largely miniaturized.

In all cases, the equipment evaluations were accomplished with little
or no difficulty. The behavior components, parameters, and dimensions of
the Data Store were both inclusive and definitive enough to assure an effective
evaluation. Subsequent analysis of the evaluations pointed out certain
areas and aspects of the evaluation materials that could be revised to further
enhance the versatility of the Index. These revisions were chiefly
concerned with improving definitions of terms used in the Index.

As  a  consequence  of  the  tryout  and  revisions  made,  it  is  believed  that 
the  Index  is  sufficiently  versatile  to  be  applicable  to  a  wide,  if  not 
exhaustive,  range  of  electronic  equipments. 

Reliability 

The  evaluations  conducted  during  the  tryout  were  performed  by  two 
members  of  the  project  staff  and  two  civilian  employees  of  the  U.  S.  Army 
Electronic Proving Ground. An engineer from the Electronic Warfare Department
evaluated two equipments; a Human Factors Specialist with the Signal
Communications  Department  evaluated  one  equipment.  These  two  individuals 
had  no  familiarity  with  the  Index,  except  that  gained  from  the  Instruction 
Manual  and  their  own  experience  during  evaluation.  Members  of  the  project 
staff,  well  trained  in  the  use  of  the  Index,  evaluated  all  four  equipments. 
Table  V  presents  a  summary  of  the  evaluation  results  in  terms  of  total 
scores  for  each  evaluator  for  each  equipment  evaluated. 

Table V. Total Index Scores for Individual Evaluators

Equipment          Score          Evaluators (A-D, in the order printed)

AN/GRC-50          Time           990.91    1206.11
                   Reliability    .47       .46

M-33               Time           487.90    194.74    191.80
                   Reliability    .82       .81       .81

AN/MLQ-8 (XL-2)    Time           174.52    166.22    202.38
                   Reliability    .96       .96       .96

AN/APS-94          Time           98.08     73.40     140.95
                   Reliability    .96       .95       .96

The evaluation of the AN/GRC-50 was considered as a preliminary tryout
of the Index. Based on the results of this one evaluation, significant
changes in the Index were made. A majority of these changes were procedural
in nature, based on concepts developed during the evaluation. Changes in
Index materials were also made, but were generally minor in nature. The
revisions made, however, were numerous enough, and of such significance
that the evaluations of the other equipments were entirely different in
nature from the evaluation of the AN/GRC-50. Since time prevented a
re-evaluation of this equipment, it does not enter in the detailed analysis
of results which follows.

Estimates of Inter-Rater Reliability

Inter-rater reliability estimates were computed over the three evaluations
selected for detailed analysis. These estimates were based on the
rank order of both phase and aspect scores. Reliability estimates were
computed separately for both time and reliability scores. The results of
this analysis are summarized in Table VI below. More detailed tables
relating to each of the entries below appear in Appendix B.

Table VI. Summary of Inter-Rater Reliability Estimates
for Phase and Aspect Scores. (Median
reliability for three evaluators.)

Equipment    Level               Reliability of    Reliability of
                                 Time Scores       Reliability Scores

All          Phase               .85               .96
M-33         Aspect and Phase    .72               .92
AN/MLQ-8     Aspect and Phase    .86               .94
AN/APS-94    Aspect and Phase    .75               .94

The first entry above considers only total phase scores, without regard
to equipment type. The remaining entries are based upon comparisons of
total aspect scores (input, mediating process, output), within phases of
operation  for  each  item  of  equipment.  This  increases  the  number  of  com¬ 
parisons  possible,  and  thus  yields  a  more  sensitive  estimate  of  relia¬ 
bility. 
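
A rank-order reliability estimate of this kind can be sketched as follows. The report does not name the coefficient it used; the sketch assumes Spearman's rank-order correlation for untied scores, and the two score lists are hypothetical. With three evaluators, one such coefficient would be computed for each pair and the median taken.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank-order correlation for two untied score lists."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical phase time scores (seconds) from two evaluators.
evaluator_a = [12.4, 30.1, 8.7, 45.0, 19.9]
evaluator_b = [14.0, 27.5, 9.1, 41.2, 29.0]
print(spearman_rho(evaluator_a, evaluator_b))
```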

Per Cent Agreement of Ratings

To further assess the reliability of the Index, the per cent agreement
among evaluators at the most microscopic level possible was determined. That
is, the evaluation consisted in identifying components, selecting relevant
parameters, and determining appropriate dimensions of the parameters. If the
same dimensions were always chosen, the scores would be identical, since the
data is related to dimensions. It was at the dimensions level that evaluators
were compared. First, the total number of dimensions used in an evaluation
was determined. Then, the number of times all three evaluators chose the same
dimension was determined. The comparison of these two numbers was expressed
as "per cent agreement." This information is presented below in Table VII
for each equipment and across all equipments.


Table VII. Per Cent Total Agreement Among Three Evaluators

Equipment    Total Entries    Total Number     Percentage
                              of Agreements    Agreement

AN/APS-94    427              379              89%
M-33         1023             894              87%
AN/MLQ-8     425              271              64%
All          1875             1544             82%
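
The percentages in Table VII follow directly from the entry counts, as the short check below shows. The function and variable names are assumptions made for illustration; the counts are those printed in the table.

```python
# equipment -> (total entries, entries on which all three evaluators agreed)
TABLE_VII = {
    "AN/APS-94": (427, 379),
    "M-33": (1023, 894),
    "AN/MLQ-8": (425, 271),
}


def percent_agreement(total, agreed):
    """Per cent of entries on which all evaluators chose the same dimension."""
    return 100.0 * agreed / total


for name, (total, agreed) in TABLE_VII.items():
    print(name, round(percent_agreement(total, agreed)))

# The "All" row pools the counts rather than averaging the percentages.
all_total = sum(total for total, _ in TABLE_VII.values())
all_agreed = sum(agreed for _, agreed in TABLE_VII.values())
print("All", all_total, all_agreed, round(percent_agreement(all_total, all_agreed)))
```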

Validity

The construct validity of the Index, i.e., its measurement of factors
critical to operator performance, seems assured. Index scores are a function
of factors characterized by an experimentally demonstrated relationship
to performance time and operator error in system operation. Content validity,
or the extent to which the content of the Index samples factors related to
operational complexity, cannot be so easily demonstrated. The literature
survey leading to the identification of relevant components, parameters,
and dimensions in the Data Store was both systematic and comprehensive.
However, it cannot be established that all of the critical factors relevant
to operation of electronic equipment have been considered in the
experimental literature.

The critical issue is, of course, how well predictions of operator
performance based on the Index conform to actual performance in an operational
situation. Due to practical limitations involved in the conduct
of the study, it was not possible to obtain statistical measures of the
predictive validity of the Index. Attempts were made by both the research
staff and U. S. Army Electronic Proving Ground personnel to obtain data on
actual operator performance with the equipments evaluated. These attempts
were hampered by the relative unavailability of operators and their equipment,
the time and technical difficulties involved in obtaining microscopic
time measures, and the limited number of qualified observers.

In  spite  of  these  difficulties,  a  few  performance  times  were  obtained, 
which  were  compared  with  predicted  performance  times  generated  through 
application  of  the  Index.  This  comparison,  though  too  limited  in  scope 
for  statistical  analysis,  did  provide  some  interesting  information  regard¬ 
ing  the  extent  to  which  the  Operability  Index  can  predict  performance  time. 

First,  predicted  times  seem  to  be  much  more  accurate  for  behaviors 
involving  the  use  of  controls  and  displays  than  for  more  gross  manipula¬ 
tions,  such  as  cable  connections  or  antenna  erection.  Predicted  times  for 
control  panel  operations  were  almost  always  in  close  agreement  with  actual 
performance  times.  Observed  times  for  gross  manipulations,  however,  often 
were  as  much  as  three  times  larger  than  predicted  times.  Informal  observa¬ 
tion  during  the  tryout  did  suggest  that  there  is  tremendous  variability  in 
performance  time  associated  with  gross  behaviors. 

Another  interesting  indication  was  that  observed  performance  time  was 
almost  always  greater  than  predicted  time.  This  trend  seemed  constant  even 
in  control  panel  operations  with  experienced  personnel,  even  though  the 
differences between predicted and observed time were small.

It was not possible to obtain formal data concerning operator error
during  the  tryout.  The  actual  observation  of  operator  error  was  impractical 
within  the  scope  of  the  study  due  to  the  infrequency  of  errors.  Following 
the  tryout,  however,  an  attempt  was  made  to  assess  the  opinion  of  experi¬ 
enced  operators  and  their  supervisors  with  regard  to  the  relative  potential 
error  associated  with  each  of  the  four  evaluated  equipments.  Forms  and 
instructions  for  the  ranking  of  each  of  the  four  equipment  items  were 
prepared by the staff and taken into the field by site personnel. Unfortunately,
only  one  operator  could  be  located  who  had  experience  with  more  than  one 
equipment,  and  this  individual  was  familiar  with  only  two  systems. 

It was possible to obtain a gross, rational estimate of the general
operational complexity of each of the evaluated equipments from the users.
Users of the equipment expressed some opinion as to the difficulty of their
system as compared with other units which they had operated previously.
These users were primarily technical supervisory personnel with several
years' experience in operating the equipment under a wide range of conditions.
It is felt that the qualifications of these personnel, their unanimity of
opinion, and the gross design differences between the evaluated equipments
provide ample justification for the ranking of these equipments according
to their operational complexity. As can be seen from Table VIII, the total
time and reliability scores from the Index agree well with this ranking.


Table VIII. Indication of Index Validity

Equipment (ranked in order           Mean Index Scores
of judged complexity)                Time in Seconds    Reliability

AN/GRC-50 (judged most complex)      1098.51            .46
M-33                                 288.48             .81
AN/MLQ-8                             181.04             .96
AN/APS-94                            104.14             .96

Thus,  although  a  formal,  statistical  estimate  of  Index  validity  was  not 
possible within the scope of the study, the available information suggests
that the Index does provide a reasonably valid appraisal of the operating
complexity  of  electronic  equipment.  Adequate  validation,  however,  yet 
remains  to  be  done. 


Utility

The  Sample  Equipment  Evaluations  Report,  devoted  to  reports  on  the 
evaluations  of  all  four  equipments,  demonstrates  the  utility  of  the  Index. 
The  major  uses  of  the  evaluation  results  are  briefly  described  below. 

Acceptability of the level of complexity of equipment is reflected by
the total time and reliability estimates provided for the over-all operating
sequence. These estimates provide information for deciding upon the
acceptability of the given human engineering design of evaluated equipment.
The decisions anticipated will be two-fold: 1) can required operations
usually be performed within the time expected to be available, and 2) is
operator reliability sufficient for the intended mission of the equipment?
The necessity of these decisions should be emphasized. The results of the
Index are not end processes; they must be interpreted in the light of the
total equipment context and the mission the equipment is to achieve. Low
reliability and high time scores are neither good nor bad, in and of
themselves. To strive for .99 reliability for all equipments and missions is
senseless. The reasonable approach is to strive for enough reliability to
meet the purpose of the equipment, and no more. The Index cannot make the
decisions. It can only facilitate the decisions by providing information
of direct relevance.

Redesign alternatives are reflected by the acceptability of the existing
design, and the potential for enhancing the acceptability by human
engineering redesign. Assuming that redesign is to be seriously considered,
the Index scores are sufficiently diagnostic that recommendations can be
made in detail, and quantitatively justified in terms of enhanced acceptability
of the equipment in general.

Selection and training of operators may be either an alternative to
redesign, or may be a separate consideration. Selection and training
recommendations never fully compensate for design inadequacies that are, in
part, responsible for the complexity. However, whether or not redesign is a
consideration, Index results can provide information relevant to the selection
and training of operators so that actual operating performance will be better
than the Index estimates indicate.

RECOMMENDATIONS


Versatility

It  is  unfortunate  that  the  Index  could  not  be  tried  out  on  equipment 
still  in  the  developmental  cycle.  It  is  at  this  stage  where  Index  results 
would  be  most  beneficial,  and,  in  fact,  is  the  stage  at  which  evaluation 
with  the  Index  was  intended.  It  was  also  hoped  that  a  broader  sample  of 
equipment  could  be  evaluated.  Plans  were  made  toward  this  end,  but  the 
equipment  was  not  available  at  the  time  of  the  tryout.  Although  current 
evidence  seems  to  assure  the  versatility  of  the  Index,  it  is  recommended 
that  further  tryouts  be  conducted  on  a  variety  of  equipments,  and  at 
various  stages  of  the  developmental  cycle. 

Reliability

The current evidence clearly suggests that the Index is a reliable
evaluation tool. However, the number of equipments and evaluators
available was too limited to ensure that the reliability figures presented are
accurate. Also, the figures reported are restricted to inter-evaluator
reliability. It would be highly desirable to follow this up with
test-retest reliability estimates for a number of equipments using an adequate
number of evaluators.


Validity

It is in this area that the current study is most restricted. Although
the evidence is limited, it would seem that the evaluation procedure is
valid, and that it orders equipment in terms of complexity in agreement with
expert judges. The actual validity of results, the accuracy of the time and
reliability scores, is not established. Some differences in Index scores
and actual measures were obtained. The information was insufficient to
determine whether this was a consistent or sporadic difference. Consistent
differences, if they exist, could be easily remedied by scaling the scores
to compensate for the differences. Sporadic differences, unless they relate
to inherently variable tasks, could probably be eliminated by altering
appropriate instructions. Neither approach is called for on the basis of
available information.
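
The scaling remedy mentioned above can be illustrated with a minimal sketch. It assumes that a "consistent difference" means the measured values are roughly proportional to the Index scores, and it fits a single least-squares scale factor. The function name `fit_scale` and all numbers are hypothetical illustrations; they are not data or procedures from the study.

```python
def fit_scale(index_scores, measured):
    """Return (k, residuals) for the least-squares fit measured ~ k * index.

    k minimizes sum((k * x - y) ** 2). Small residuals suggest a consistent
    (scalable) difference; large, irregular residuals suggest a sporadic one.
    """
    sxx = sum(x * x for x in index_scores)
    sxy = sum(x * y for x, y in zip(index_scores, measured))
    k = sxy / sxx
    residuals = [y - k * x for x, y in zip(index_scores, measured)]
    return k, residuals


# Hypothetical paired values: Index time scores versus measured times.
index_times = [10.0, 20.0, 40.0]
measured_times = [12.0, 24.0, 48.0]

k, residuals = fit_scale(index_times, measured_times)
scaled_scores = [k * x for x in index_times]
print(k, residuals)
```

In this made-up case every measured time exceeds the Index score by the same proportion, so one scale factor removes the difference. Residuals that stayed large after scaling would point to the sporadic case instead.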

Rigorously assessing the validity of the Index results against
adequate measures of field operation is of primary importance to the use and
further development of the Index. Lacking this validation, the Index will
remain a tentative procedure, even though based on the best a priori
information currently available.


Utility

While the utility of the Index seems assured, it was not possible to
investigate this characteristic with the various personnel and agencies who
will make use of Index results. The information provided by the Index
seems to be uniquely useful for a variety of purposes. Whether this
information would, in fact, be useful to testing, training, and personnel
specialists remains in question. It is conceivable that the information
provided is of the wrong sort, at the wrong level, or expressed in the
wrong terms to be maximally useful. It is equally likely that there are
untapped sources of information in the Index. Clearly, the utility of the
Index should be determined in association with the people who have need for
the information provided.

APPENDIX B


RANK  ORDER  RELIABILITY  TABLES 


Table I. Phase Level Comparison Over Two Equipments


A. Time Scores

                          Rank Order by Evaluator
Phases                       A       B       C

AN/APS-94                    2       1       1
Chief Radar Operator         3       2       2
Computer Operator            1       3       3
Range Operator               4       5       4
Azimuth Operator             5       4       5
Elevation Operator           6       6       6

Rank order correlations between evaluators (rho):
AB = .7714    AC = .8286    BC = .9429    Median = .86


B. Reliability Scores

                          Rank Order by Evaluator
Phases                       A       B       C

AN/APS-94                    5       4.5     4
Chief Radar Operator         4       4.5     5
Computer Operator            6       6       6
Range Operator               2.5     2.5     2.5
Azimuth Operator             2.5     2.5     2.5
Elevation Operator           1       1       1

Rank order correlations between evaluators (rho):
AB = .9857    AC = .9429    BC = .9857    Median = .96
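
The report does not state how the coefficients in Table I were computed. As a hedged check, the sketch below assumes the standard Spearman rank-difference formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), and applies it to the Part A time-score ranks. The function and variable names are illustrative, not taken from the report.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank-difference coefficient for two equal-length rank lists."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Table I, Part A ranks in row order: AN/APS-94, Chief Radar Operator,
# Computer Operator, Range Operator, Azimuth Operator, Elevation Operator.
eval_a = [2, 3, 1, 4, 5, 6]
eval_b = [1, 2, 3, 5, 4, 6]
eval_c = [1, 2, 3, 4, 5, 6]

print(round(spearman_rho(eval_a, eval_b), 4))  # 0.7714
print(round(spearman_rho(eval_a, eval_c), 4))  # 0.8286
print(round(spearman_rho(eval_b, eval_c), 4))  # 0.9429
```

Under that assumption the three values agree with the AB, AC, and BC entries for the time scores.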

Table II. Aspect and Phase Level Comparison for the H-33


A. Time Scores

                          Rank Order by Evaluator
Aspect or Phase              A       B       C

Total
  Input                      3       2       2
  Mediating Process          1      12      13
  Output                     4       1       1
Chief Radar Operator
  Input                      5       3       3
  Mediating Process         15.5    15      15
  Output                     7       5       4
Azimuth Operator
  Input                      9       7       8
  Mediating Process         15.5    16      16
  Output                    10       8       9
Range Operator
  Input                     11      10      11
  Mediating Process         17      17.5    17.5
  Output                     8       6       6
Elevation Operator
  Input                     14       3      12
  Mediating Process         18      17.5    17.5
  Output                    13       9      10
Computer Operator
  Input                     12      11       7
  Mediating Process          2      14      14
  Output                     6       4       5

Rank order correlations between evaluators (rho):
AB = .5470    AC = .6314    BC = .8927    Median = .72


B. Reliability Scores

                          Rank Order by Evaluator
Aspect or Phase              A       B       C

Total
  Input                     18      18      18
  Mediating Process         13.5     5.5     5.5
  Output                    17      17      17
Chief Radar Operator
  Input                      6.5    13      13
  Mediating Process          6.5     5.5     5.5
  Output                    13.5    15      15
Azimuth Operator
  Input                      6.5     5.5     5.5
  Mediating Process          6.5     5.5     5.5
  Output                     6.5     5.5     5.5
Range Operator
  Input                      6.5     5.5     5.5
  Mediating Process          6.5     5.5     5.5
  Output                     6.5    13      13
Elevation Operator
  Input                      6.5     5.5     5.5
  Mediating Process          6.5     5.5     5.5
  Output                     6.5     5.5     5.5
Computer Operator
  Input                     16      16      16
  Mediating Process          6.5     5.5     5.5
  Output                    15      13      13

Rank order correlations between evaluators (rho):
AB = .8457    AC = .8457    BC = 1.0000


Table III. Aspect and Phase Level Comparison
for the AN/MLQ-8 (XL-2)


A. Time Scores

                          Rank Order by Evaluator
Aspect or Phase              A       B       C

Total
  Input                      4.5     4.5     6.5
  Mediating Process          6       6       4
  Output                     1       1       1
Erect Antenna
  Input                     omit    omit    omit
  Mediating Process          8       7       5
  Output                     2       2       2
Operate
  Input                      4.5     4.5     6.5
  Mediating Process          7       8       8
  Output                     3       3       3

Rank order correlations between evaluators (rho):
AB = .9762    AC = .7381    BC = .8095    Median = .86


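B. Reliability Scores

Rank Order by Evaluator
[The aspect and phase labels for the rows below are not recoverable from the source.]

   A       B       C

   3.5     3       3
   3.5     3       3
   3.5     6.5     6.5
   3.5     3       3
   3.5     3       3
   8       8       8
  omit    omit    omit
   3.5     3       3
   7       6.5     6.5

Rank order correlations between evaluators (rho):
AB = .8750    AC = .8750    BC = 1.0000    Median = .94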



Table IV. Aspect and Phase Level Comparison
for the AN/APS-94


A. Time Scores

                          Rank Order by Evaluator
Phase                        A       B       C

Total only
  Input                      2       1       2
  Mediating Process          3       3       3
  Output                     1       2       1

Rank order correlations between evaluators (rho):
AB = .5000    AC = 1.0000    BC = .5000    Median = .7500


B. Reliability Scores

                          Rank Order by Evaluator
Phase                        A       B       C

Total only
  Input                      1.5     2       1.5
  Mediating Process          1.5     1       1.5
  Output                     3       3       3

Rank order correlations between evaluators (rho):
AB = .8750    AC = 1.0000    BC = .8750    Median = .9375



